Spoken cross-language access to image collection via captions
Abstract
This paper presents a framework for using Chinese speech to access images via English captions. The formulation and structure-mapping rules of Chinese and English named entities are extracted from an NICT foreign location name corpus. For a named location, the name part is usually transliterated and the keyword part is usually translated. Keyword spotting identifies the keyword in a speech query and narrows the search space of the image collection. A scoring function is proposed to compute the similarity between the speech query and the annotated captions in terms of the International Phonetic Alphabet. The experimental results show that the average rank and the mean reciprocal rank are 2.04 and 0.8322, respectively, both close to the best possible value of 1 for either measure.
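The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: each query has one correct image, its rank is the 1-based position of that image in the returned list, average rank is the mean of those ranks, and mean reciprocal rank is the mean of their reciprocals. The function names and the example ranks are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def average_rank(ranks: Sequence[int]) -> float:
    """Mean of the 1-based ranks at which the correct image was returned."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Mean of 1/rank over all queries; 1.0 means every answer was ranked first."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for four spoken queries (illustrative data only).
example_ranks = [1, 1, 2, 4]
print(average_rank(example_ranks))          # 2.0
print(mean_reciprocal_rank(example_ranks))  # 0.6875
```

Under these definitions a system that always ranks the correct image first scores 1 on both measures, which is the best value the abstract refers to.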
Similar resources
User experiments with the Eurovision cross-language image retrieval system
In this paper we present Eurovision, a text-based system for cross-language (CL) image retrieval. The system is evaluated by multilingual users for two search tasks with the system configured in English and five other languages. To our knowledge this is the first published set of user experiments for CL image retrieval. We show that: (1) it is possible to create a usable multilingual search eng...
Unsupervised Learning of Spoken Language with Visual Context
Humans learn to speak before they can read or write, so why can’t computers do the same? In this paper, we present a deep neural network model capable of rudimentary spoken language acquisition using untranscribed audio training data, whose only supervision comes in the form of contextually relevant visual images. We describe the collection of our data comprised of over 120,000 spoken audio cap...
Learning Word-Like Units from Joint Audio-Visual Analysis
Given a collection of images and spoken audio captions, we present a method for discovering word-like acoustic units in the continuous speech signal and grounding them to semantically relevant image regions. For example, our model is able to detect spoken instances of the words “lighthouse” within an utterance and associate them with image regions containing lighthouses. We do not use any form ...
Caption vs. Query Translation for Cross-Language Image Retrieval
For many cross-language retrieval tasks, the predominant approach is to translate the query into the language of the document collection (target language). This often gives results as good as, if not better than, translating the document collection into the query language (source language). In this paper, we evaluate query versus document translation for the ImageCLEF 2004 bilingual ad hoc retr...
Cross-Language Image Retrieval via Spoken Query
This paper studies cross-language cross-medium information retrieval. We introduce several approaches to unify the languages and media of queries and documents. We experiment on cross-language image retrieval via spoken query. Two approaches are proposed to recognize and translate spoken queries. We also propose a similarity-based approach to identify and backward transliterate named entities i...
Publication date: 2003